feat: Demo mode #2523

Closed
wants to merge 33 commits into from

Conversation

@Matvey-Kuk Matvey-Kuk (Contributor) commented Nov 18, 2024

LIVE_DEMO_MODE will launch what used to be simulate_alerts.py in the background

Closes #2524

✅ Closed in favor of #2561

Summary by CodeRabbit

  • New Features

    • Introduced a new environment variable KEEP_LIVE_DEMO_MODE for simulating alerts in demo mode.
    • Added functionality for simulating alerts through the enhanced demo_mode_runner module, including structured correlation rules and a robust service topology.
  • Bug Fixes

    • Enhanced alert configurations across multiple providers to improve specificity and organization.
  • Documentation

    • Updated deployment documentation to include the new KEEP_LIVE_DEMO_MODE environment variable.
  • Chores

    • Simplified alert simulation logic by integrating with the new demo mode functionality.
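
The KEEP_LIVE_DEMO_MODE variable described above is read from the environment at startup. As a rough sketch of the boolean-flag parsing pattern (the actual parsing lives in keep/api/config.py, so the default and accepted values shown here are assumptions):

```python
import os

# Hedged sketch: parse a boolean env flag such as KEEP_LIVE_DEMO_MODE with a
# safe default and a case-insensitive comparison. The real implementation in
# keep/api/config.py may differ in which truthy spellings it accepts.
def parse_bool_env(name, default="false"):
    return os.environ.get(name, default).strip().lower() in ("true", "1", "yes")
```

With this pattern an unset variable quietly falls back to the default instead of raising, which is the usual expectation for opt-in demo flags.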


vercel bot commented Nov 18, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name: keep · Status: Ignored · Updated (UTC): Nov 20, 2024 9:04am

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 18, 2024
@Matvey-Kuk Matvey-Kuk changed the title Demo mode feat: Demo mode Nov 18, 2024
@dosubot dosubot bot added Documentation Improvements or additions to documentation Feature A new feature labels Nov 18, 2024
@Matvey-Kuk Matvey-Kuk marked this pull request as draft November 18, 2024 11:42
@Matvey-Kuk Matvey-Kuk marked this pull request as ready for review November 18, 2024 14:52
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Nov 18, 2024
Resolved review threads (outdated): keep/api/config.py (2)
Matvey-Kuk and others added 2 commits November 18, 2024 18:11
Co-authored-by: Tal <[email protected]>
Signed-off-by: Matvey Kukuy <[email protected]>
Co-authored-by: Tal <[email protected]>
Signed-off-by: Matvey Kukuy <[email protected]>
logger.info("Demo mode launched.")

keep_api_url = "http://localhost:" + str(os.environ.get("PORT", 8080))
keep_api_key = os.environ.get("KEEP_API_KEY")
A repository Member suggested this change:

-keep_api_key = os.environ.get("KEEP_API_KEY")
+keep_api_key = os.environ.get("KEEP_READ_ONLY_BYPASS_KEY")


coderabbitai bot commented Nov 19, 2024

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The changes in this pull request introduce a new environment variable KEEP_LIVE_DEMO_MODE to the Keep application, allowing for a demo mode that simulates alerts. The configuration documentation has been updated accordingly. The keep/api/config.py file has been modified to include a new global variable LIVE_DEMO_MODE and a function to manage demo mode activation. Additionally, various alert mock files across different providers have been updated to enhance alert configurations.

Changes

  • docs/deployment/configuration.mdx: Added new environment variable KEEP_LIVE_DEMO_MODE for demo mode configuration.
  • keep/api/config.py: Introduced global variable LIVE_DEMO_MODE and updated on_starting function for demo mode management.
  • keep/api/core/demo_mode_runner.py: Added functions for managing alerts and demo mode: get_or_create_topology, get_or_create_correlation_rules, remove_old_incidents, simulate_alerts, launch_demo_mode.
  • keep/providers/cloudwatch_provider/alerts_mock.py: Updated ALERTS dictionary for high_cpu_usage alert with new alarm names.
  • keep/providers/datadog_provider/alerts_mock.py: Modified tags for high_cpu_usage and low_disk_space alerts to include service:api.
  • keep/providers/grafana_provider/alerts_mock.py: Added service key to database_connection_failure, high_memory_usage, and network_latency_high alerts.
  • keep/providers/prometheus_provider/alerts_mock.py: Updated labels.service for multiple alerts and added customer_id to mq_third_full alert.
  • scripts/simulate_alerts.py: Simplified alert simulation by calling simulate_alerts function from demo_mode_runner.

Assessment against linked issues

  • Implement demo mode functionality: addressed

🐰 In a world where alerts do play,
A demo mode brightens the day.
With variables new and logic refined,
Simulating alerts, oh how well aligned!
Hop along, let the testing commence,
In our lively demo, there's no pretense! 🐇✨


@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (10)
keep/providers/cloudwatch_provider/alerts_mock.py (1)

14-14: Consider namespace consistency with pod-related alerts.

The new alarm names are more descriptive and better reflect real-world scenarios. However, "PodRecycled" seems inconsistent with the AWS/EC2 namespace in the alert payload. If these alerts are meant to simulate Kubernetes pod scenarios, consider using a more appropriate namespace (e.g., ContainerInsights).

Consider this modification:

-            "Message.AlarmName": ["HighCPUUsage", "HighCPUUsageOnAPod", "PodRecycled"],
+            "Message.AlarmName": ["HighCPUUsage", "HighCPUUsageOnInstance", "InstanceRecycled"],

Or update the namespace if these are meant to be Kubernetes-related alerts:

             "MetricName": "CPUUtilization",
-            "Namespace": "AWS/EC2",
+            "Namespace": "ContainerInsights",
scripts/simulate_alerts.py (1)

Line range hint 1-31: Well-structured refactoring of alert simulation

The changes effectively:

  • Simplify the script by moving complex logic to a dedicated module
  • Provide configuration flexibility through environment variables
  • Follow separation of concerns principle

This aligns well with the PR objective of implementing demo mode.

keep/providers/datadog_provider/alerts_mock.py (1)

Line range hint 1-38: Consider documenting the alert variation logic

Since this mock data will be used by the new demo mode feature, it would be helpful to add comments explaining:

  1. The purpose of parameter variations
  2. How the demo mode selects between different parameter options
  3. The relationship between payload defaults and parameter variations

This documentation would help maintainers understand the demo mode behavior.

keep/providers/prometheus_provider/alerts_mock.py (2)

23-23: Consider making customer_id configurable for demo flexibility

While "acme" is a good default, consider making the customer_id a parameter for more flexible demos. This would allow demonstrating multi-tenant scenarios more effectively.

"parameters": {
    "labels.queue": ["queue1", "queue2", "queue3"],
    "labels.service": ["calendar-producer-java-otel-api-dd", "kafka", "queue"],
    "labels.mq_manager": ["mq_manager1", "mq_manager2", "mq_manager3"],
+   "labels.customer_id": ["acme", "demo-corp", "test-org"],
},

Also applies to: 28-28


14-14: Consider reducing service list duplication

The same service list appears in multiple alerts. Consider defining it once as a constant to improve maintainability.

+COMMON_SERVICES = ["calendar-producer-java-otel-api-dd", "kafka", "api", "queue", "db"]

 ALERTS = {
     "high_cpu_usage": {
         # ...
         "parameters": {
-            "labels.service": ["calendar-producer-java-otel-api-dd", "kafka", "api", "queue", "db"],
+            "labels.service": COMMON_SERVICES,

Also applies to: 41-41, 54-54

keep/api/config.py (1)

62-73: Ensure idempotency of when_ready function.

The function should be idempotent as it might be called multiple times by the server. Consider adding a guard to prevent multiple initializations of demo mode.

+_demo_mode_initialized = False

 def when_ready(server):
     """This function is called by the gunicorn server when it is ready to accept requests"""
     logger.info("Keep server ready")

     launch_uptime_reporting()

-    if LIVE_DEMO_MODE:
+    global _demo_mode_initialized
+    if LIVE_DEMO_MODE and not _demo_mode_initialized:
         logger.info("Launching Keep in demo mode.")
         try:
             from keep.api.core.demo_mode_runner import launch_demo_mode
             launch_demo_mode()
+            _demo_mode_initialized = True
         except Exception as e:
             logger.error(f"Failed to launch demo mode: {e}")
     else:
         logger.info("Not launching Keep in demo mode.")
keep/providers/grafana_provider/alerts_mock.py (1)

Line range hint 4-97: Consider establishing service categorization standards

The addition of service categorization is a good enhancement, but there seems to be no clear standard for service naming. Consider:

  1. Documenting the allowed service categories
  2. Establishing naming conventions for services
  3. Adding validation for service values

This would be particularly important for the demo mode feature to ensure consistent and realistic alert simulation.

keep/api/core/demo_mode_runner.py (3)

394-394: Fix the typo in the docstring.

There's a typo in the docstring: "backgound" should be "background".

Apply this diff to correct the typo:

     """
-    Running async demo in the backgound.
+    Running async demo in the background.
     """

399-400: Ensure the default port is correctly set as a string.

When constructing keep_api_url, ensure that the default port is a string to avoid issues during string concatenation.

Apply this diff to set the default port as a string:

 keep_api_url = os.environ.get(
-    "KEEP_API_URL", "http://localhost:" + str(os.environ.get("PORT", 8080))
+    "KEEP_API_URL", "http://localhost:" + os.environ.get("PORT", "8080")
 )
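
The fix works because os.environ values are always strings, so keeping the default a string avoids mixing types in the concatenation. A tiny self-contained illustration of the pattern (the build_api_url helper is purely for illustration and not part of the PR):

```python
# Env mappings yield str values, so a string default keeps the concatenation
# type-safe with no str() conversion needed. `environ` can be os.environ or
# any mapping, which also makes the helper easy to test.
def build_api_url(environ):
    port = environ.get("PORT", "8080")
    return "http://localhost:" + port
```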

338-390: Add a top-level exception handler in the simulate_alerts loop.

To prevent the simulate_alerts thread from exiting unexpectedly due to unhandled exceptions, consider adding a top-level exception handler around the while True loop. This ensures that any unforeseen errors are logged and the loop continues running.

Apply this diff to add a top-level exception handler:

 def simulate_alerts(
     keep_api_url=None,
     keep_api_key=None,
     sleep_interval=5,
     demo_correlation_rules=False,
     demo_topology=False,
 ):
     logger.info("Simulating alerts...")
     GENERATE_DEDUPLICATIONS = True

     providers = [
         "prometheus",
         "grafana",
         "cloudwatch",
         "datadog",
     ]

     providers_to_randomize_fingerprint_for = [
         "cloudwatch",
         "datadog",
     ]

     provider_classes = {
         provider: ProvidersFactory.get_provider_class(provider)
         for provider in providers
     }

     if demo_correlation_rules:
         logger.info("Creating correlation rules...")
         get_or_create_correlation_rules(keep_api_key, keep_api_url)
         logger.info("Correlation rules created.")
     if demo_topology:
         logger.info("Creating topology...")
         get_or_create_topology(keep_api_key, keep_api_url)
         logger.info("Topology created.")

+    try:
         while True:
             logger.info("Looping to send alerts...")

             logger.info("Removing old incidents...")
             remove_old_incidents(keep_api_key, keep_api_url)
             logger.info("Old incidents removed.")

             # choose provider
             provider_type = random.choice(providers)
             send_alert_url = "{}/alerts/event/{}".format(keep_api_url, provider_type)
             provider = provider_classes[provider_type]
             alert = provider.simulate_alert()

             send_alert_url_params = {}

             if provider_type in providers_to_randomize_fingerprint_for:
                 send_alert_url_params["fingerprint"] = "".join(
                     random.choices("abcdefghijklmnopqrstuvwxyz0123456789", k=10)
                 )

             # Determine number of times to send the same alert
             num_iterations = 1
             if GENERATE_DEDUPLICATIONS:
                 num_iterations = random.randint(1, 3)

             for _ in range(num_iterations):
                 logger.info("Sending alert: {}".format(alert))
                 try:
                     env = random.choice(["production", "staging", "development"])
                     send_alert_url_params["provider_id"] = f"{provider_type}-{env}"
                     prepared_request = PreparedRequest()
                     prepared_request.prepare_url(send_alert_url, send_alert_url_params)
                     response = requests.post(
                         prepared_request.url,
                         headers={"x-api-key": keep_api_key},
                         json=alert,
                     )
                     response.raise_for_status()  # Raise an HTTPError for bad responses
                 except requests.exceptions.RequestException as e:
                     logger.error("Failed to send alert: {}".format(e))
                     time.sleep(sleep_interval)
                     continue

                 if not response.ok:
                     logger.error("Failed to send alert: {}".format(response.text))
                 else:
                     logger.info("Alert sent successfully")

             logger.info(
                 "Sleeping for {} seconds before next iteration".format(sleep_interval)
             )
             time.sleep(sleep_interval)
+    except Exception as e:
+        logger.exception("An unexpected error occurred in simulate_alerts: {}".format(e))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b53e502 and 353e42c.

📒 Files selected for processing (8)
  • docs/deployment/configuration.mdx (1 hunks)
  • keep/api/config.py (2 hunks)
  • keep/api/core/demo_mode_runner.py (1 hunks)
  • keep/providers/cloudwatch_provider/alerts_mock.py (1 hunks)
  • keep/providers/datadog_provider/alerts_mock.py (2 hunks)
  • keep/providers/grafana_provider/alerts_mock.py (3 hunks)
  • keep/providers/prometheus_provider/alerts_mock.py (4 hunks)
  • scripts/simulate_alerts.py (2 hunks)
🔇 Additional comments (12)
scripts/simulate_alerts.py (3)

4-4: LGTM: Clean import of demo mode functionality

The import statement properly organizes demo-related functionality into a dedicated module.


21-26: Clarify the purpose of unused correlation rules feature

The demo_correlation_rules parameter is set to False but its purpose isn't clear. Consider:

  • Adding a comment explaining this parameter
  • Documenting if this is for future use
✅ Verification successful

The demo_correlation_rules parameter is a valid feature for creating demo correlation rules

The parameter is used to optionally create predefined correlation rules during alert simulation, as evidenced by:

  • Implementation in keep/api/core/demo_mode_runner.py shows a complete correlation rules feature
  • When enabled, it creates rules like "Message Queue Buckle Up" that groups MQ-related alerts
  • The parameter is intentionally set to False in the simulation script to avoid creating demo rules by default

Consider adding a docstring to explain this optional feature for other developers.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check the implementation of simulate_alerts to understand correlation rules
ast-grep --pattern 'def simulate_alerts($$$)' 

# Search for any correlation rules implementation
rg 'correlation.*rules|rules.*correlation' -A 5

Length of output: 4263


16-18: Verify the default sleep interval value

The default sleep interval of 0.2 seconds might need validation:

  • Could be too aggressive for demo purposes
  • Might need adjustment based on typical alert patterns
✅ Verification successful

Let me gather more specific information about the alert simulation patterns.


Default sleep interval of 0.2 seconds is appropriate

The default sleep interval of 0.2 seconds in the alert simulation script is well-calibrated based on the codebase evidence:

  • Tests use shorter intervals (0.02s) for alert generation in controlled environments
  • Other system components use longer intervals (1s) for operational tasks
  • The 0.2s value provides a balanced rate for demo alert generation while avoiding overwhelming the system
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any existing sleep intervals in alert-related files for comparison
rg -g '!*.pyc' -g '!*.git' 'sleep.*interval|interval.*sleep' 

Length of output: 4263


Script:

#!/bin/bash
# Check the alert simulation logic and timing patterns
rg "time\.sleep|SLEEP_INTERVAL" -B 2 -A 2

Length of output: 8768

keep/providers/datadog_provider/alerts_mock.py (1)

32-33: Review tag consistency between payload and parameters

There's a mismatch between the teams specified in the payload (team:database) and the parameters section:

  • Production uses team:analytics
  • Staging uses team:database

If this variation is intentional for demo purposes, consider documenting this. Otherwise, consider aligning the teams for consistency.

keep/providers/prometheus_provider/alerts_mock.py (2)

14-14: LGTM! Service labels expanded for better demo coverage

The addition of common service types (api, queue, db) provides a more comprehensive demo experience.


41-41: LGTM! Consistent service coverage

The service labels are consistent with other alerts and appropriate for disk space monitoring scenarios.

keep/api/config.py (3)

16-16: LGTM! Implementation follows the suggested convention.

The environment variable handling is properly implemented with a safe default and case-insensitive comparison.


58-60: Verify server state management between on_starting and when_ready.

The "Keep server started" message in on_starting followed by "Keep server ready" in when_ready might be confusing. Consider clarifying the server states or combining the messages.


66-67: Consider the order of operations.

The launch_uptime_reporting() is called before checking demo mode. Consider whether this ordering is intentional and if uptime reporting should be active in demo mode.

✅ Verification successful

Let me gather more information about the uptime reporting configuration and demo mode interaction.


Based on the results, I can now provide the final response:

The order of operations is correct and intentional

The uptime reporting is independently controlled by the POSTHOG_DISABLED environment variable and has its own checks. It will:

  • Only run if POSTHOG_DISABLED is false
  • Only launch if PostHog is reachable
  • Log appropriate messages about its status

The demo mode has no interaction with or impact on the uptime reporting functionality, so their relative order of initialization doesn't matter.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check uptime reporting behavior in demo mode

# Look for uptime reporting configuration and usage
rg -A 5 "launch_uptime_reporting"

# Check if demo mode affects uptime reporting
ast-grep --pattern 'launch_uptime_reporting($$$)'

Length of output: 1163


Script:

#!/bin/bash
# Check uptime reporting implementation and demo mode configuration
rg -A 10 "POSTHOG_DISABLED|LIVE_DEMO_MODE" keep/api/core/

# Check if there's any interaction between demo mode and uptime reporting
rg -A 5 "is_posthog_reachable|report_uptime" keep/api/core/demo_mode_runner.py

Length of output: 2299

keep/providers/grafana_provider/alerts_mock.py (2)

52-52: LGTM! Verify consistency with other memory alerts

The service categorization as "api" aligns well with the server monitoring context.

✅ Verification successful

Service categorization is consistent and appropriate

Looking at the alerts_mock.py file, the service categorization is consistent and properly applied:

  • The high memory usage alert monitors API servers (server1, server2, server3) and is correctly tagged with "service": "api"
  • Other alerts in the file also follow consistent service categorization:
    • Database connection failures for API service are tagged with "service": "api"
    • Network latency alerts for database components are tagged with "service": "db"
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for consistency in service categorization across memory-related alerts
rg -A 5 'memory.*service' --type py

Length of output: 646


Script:

#!/bin/bash
# Search for alert definitions containing "memory" in grafana provider
rg -B 5 -A 10 '"memory"' keep/providers/grafana_provider/

Length of output: 1336


Script:

#!/bin/bash
# Search for service field in alert definitions
rg -B 5 -A 10 '"service":' keep/providers/grafana_provider/

Length of output: 3717


Script:

#!/bin/bash
# Get the full content of alerts_mock.py to understand the context
cat keep/providers/grafana_provider/alerts_mock.py

Length of output: 5582


4-4: Consider revising the service categorization

The alert monitors database connections (as evidenced by "monitor": "database") but is categorized under the "api" service. This might cause confusion in alert routing and monitoring. Consider whether this alert should be categorized under the "db" service instead for better consistency with its monitoring target.

docs/deployment/configuration.mdx (1)

28-28: Documentation change looks good!

The new environment variable KEEP_LIVE_DEMO_MODE is well-documented and properly integrated into the configuration table. The description clearly explains its purpose for simulating alerts and activity, which aligns with the PR objectives.

Resolved review threads: keep/api/config.py, keep/api/core/demo_mode_runner.py
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (1)
keep/api/core/demo_mode_runner.py (1)

25-55: Consider moving correlation rules to a configuration file.

While the rules are well-structured, having them hardcoded in the source code makes it less maintainable. Consider moving them to a separate configuration file (e.g., YAML or JSON) for easier maintenance and updates.
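
A sketch of what externalizing the rules could look like. The schema and the JSON format are assumptions (the reviewer also mentions YAML); only the "Message Queue Buckle Up" rule name comes from the PR discussion itself:

```python
import json

# Hypothetical externalized correlation rules; in practice this blob would be
# read from a file shipped alongside demo_mode_runner.py. The field names
# below are illustrative, not the actual Keep schema.
RULES_JSON = """
[
  {
    "name": "Message Queue Buckle Up",
    "timeframeInSeconds": 900,
    "attributes": ["service"]
  }
]
"""

def load_correlation_rules(raw=RULES_JSON):
    """Parse correlation rules from an external configuration blob,
    validating that every rule at least carries a name."""
    rules = json.loads(raw)
    for rule in rules:
        if "name" not in rule:
            raise ValueError("correlation rule missing 'name'")
    return rules
```

Loading the rules this way lets demos be tuned without touching source code, at the cost of one extra file to version.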

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 353e42c and e333e9e.

📒 Files selected for processing (1)
  • keep/api/core/demo_mode_runner.py (1 hunks)
🔇 Additional comments (1)
keep/api/core/demo_mode_runner.py (1)

1-24: LGTM! Well-organized imports and logging setup.

The imports are properly organized and all necessary dependencies are included.

Resolved review threads: keep/api/core/demo_mode_runner.py (1 current, 2 outdated)
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
keep/api/core/demo_mode_runner.py (2)

302-308: Add rate limiting for alert generation.

The alert simulation lacks rate limiting which could overwhelm the API endpoints.

Consider implementing a token bucket rate limiter:

+from keep.api.utils.rate_limiter import TokenBucketRateLimiter
+
 def simulate_alerts(
     keep_api_url=None,
     keep_api_key=None,
     sleep_interval=5,
     demo_correlation_rules=False,
-    demo_topology=False,
+    demo_topology=False,
+    requests_per_minute=60,
 ):
+    rate_limiter = TokenBucketRateLimiter(requests_per_minute)
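
The imported keep.api.utils.rate_limiter module does not appear anywhere in the diff, so it is presumably hypothetical. Under that assumption, a minimal token-bucket sketch of what such a class could look like:

```python
import time

class TokenBucketRateLimiter:
    """Hypothetical sketch of a token bucket: refills at rate_per_minute/60
    tokens per second, up to a burst capacity of rate_per_minute."""

    def __init__(self, rate_per_minute):
        self.capacity = float(rate_per_minute)
        self.tokens = float(rate_per_minute)
        self.refill_per_sec = rate_per_minute / 60.0
        self.last = time.monotonic()

    def acquire(self):
        """Block until one token is available, then consume it."""
        while True:
            now = time.monotonic()
            # Credit tokens accrued since the last call, capped at capacity.
            self.tokens = min(
                self.capacity, self.tokens + (now - self.last) * self.refill_per_sec
            )
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return
            # Sleep just long enough for the next token to accrue.
            time.sleep((1.0 - self.tokens) / self.refill_per_sec)
```

simulate_alerts would then call rate_limiter.acquire() immediately before each requests.post, so bursts drain the bucket and sustained traffic settles at the configured rate.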

335-336: Fix static analysis issues.

Two minor issues found:

  1. Unused exception variable
  2. Unnecessary f-string

Apply these fixes:

-        except requests.exceptions.RequestException as e:
+        except requests.exceptions.RequestException:
-            logger.info(f"Demo thread: API is not up yet. Waiting...")
+            logger.info("Demo thread: API is not up yet. Waiting...")
🧰 Tools
🪛 Ruff

335-335: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


336-336: f-string without any placeholders

Remove extraneous f prefix

(F541)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e333e9e and 2256347.

📒 Files selected for processing (1)
  • keep/api/core/demo_mode_runner.py (1 hunks)
🧰 Additional context used
🪛 Ruff
keep/api/core/demo_mode_runner.py

335-335: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


336-336: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (2)
keep/api/core/demo_mode_runner.py (2)

348-405: 🛠️ Refactor suggestion

Improve error handling in the simulation loop.

The infinite loop in simulate_alerts could benefit from better error handling and a graceful shutdown mechanism.

Consider:

  1. Adding a shutdown flag
  2. Implementing backoff strategy for failed requests
  3. Adding proper cleanup on exit
+    shutdown_flag = threading.Event()
-    while True:
+    while not shutdown_flag.is_set():
         try:
             logger.info("Looping to send alerts...")

Likely invalid or redundant comment.


407-438: 🛠️ Refactor suggestion

Review thread configuration and sleep interval.

The demo mode initialization has several potential issues:

  1. Using a daemon thread could lead to abrupt termination of ongoing operations
  2. The sleep interval of 0.2 seconds might be too aggressive

Consider:

  1. Making the sleep interval configurable via environment variable
  2. Implementing proper thread cleanup
  3. Using a non-daemon thread with proper shutdown handling
+    sleep_interval = float(os.environ.get("KEEP_DEMO_SLEEP_INTERVAL", "5.0"))
     thread = threading.Thread(
         target=simulate_alerts,
         kwargs={
             "keep_api_key": keep_api_key,
             "keep_api_url": keep_api_url,
-            "sleep_interval": 0.2,
+            "sleep_interval": sleep_interval,
             "demo_correlation_rules": True,
         },
     )
-    thread.daemon = True
+    # Register thread for cleanup on application shutdown
+    from keep.api.utils.thread_manager import register_cleanup_thread
+    register_cleanup_thread(thread)

Likely invalid or redundant comment.
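
The register_cleanup_thread helper referenced in the suggestion above also does not exist in the diff, so treat it as hypothetical. The underlying pattern of a cooperatively stoppable worker, though, can be sketched with a plain threading.Event:

```python
import threading
import time

results = {"iterations": 0}
stop_event = threading.Event()

def run_simulation(stop, sleep_interval=0.01):
    # Loop until shutdown is requested. stop.wait() doubles as an
    # interruptible sleep, so shutdown takes effect immediately instead
    # of after a full sleep_interval.
    while not stop.is_set():
        results["iterations"] += 1  # stand-in for "send one simulated alert"
        stop.wait(sleep_interval)

worker = threading.Thread(target=run_simulation, args=(stop_event,))
worker.start()
time.sleep(0.05)      # let the demo loop run briefly
stop_event.set()      # request graceful shutdown
worker.join(timeout=1)
```

Compared with a daemon thread, this costs one extra shutdown hook but guarantees the loop finishes its current iteration before the process exits.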

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (1)
keep/api/core/demo_mode_runner.py (1)

302-405: Improve alert simulation robustness and configurability.

The alert simulation could be enhanced in several ways:

  1. Alert payload validation before sending
  2. Configurable alert types and severity levels
  3. Metrics collection for monitoring the demo mode

Consider adding alert validation and metrics:

+from prometheus_client import Counter, Histogram
+
+alerts_sent = Counter('demo_alerts_sent_total', 'Number of demo alerts sent', ['provider', 'status'])
+alert_duration = Histogram('demo_alert_duration_seconds', 'Time taken to send demo alerts')
+
 def simulate_alerts(...):
     while True:
         try:
             provider_type = random.choice(providers)
             alert = provider.simulate_alert()
+            
+            # Validate alert payload
+            if not provider.validate_alert(alert):
+                logger.error(f"Invalid alert payload for {provider_type}")
+                continue
+            
+            with alert_duration.time():
                 response = requests.post(...)
+                alerts_sent.labels(provider=provider_type, status='success').inc()
         except Exception as e:
+            alerts_sent.labels(provider=provider_type, status='error').inc()
             logger.exception("Error in simulate_alerts")
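The `provider.validate_alert` call in the diff above is likewise hypothetical. A self-contained sketch of the kind of check it could perform; the required field names and severity levels are assumptions for illustration, not Keep's schema:

```python
# Assumed minimal schema for a demo alert payload.
REQUIRED_FIELDS = {"name", "severity", "source"}
ALLOWED_SEVERITIES = {"critical", "high", "warning", "info", "low"}


def validate_alert(alert: dict) -> bool:
    """Reject payloads missing required fields or carrying an unknown severity."""
    if REQUIRED_FIELDS - alert.keys():
        return False
    return alert["severity"] in ALLOWED_SEVERITIES


assert validate_alert({"name": "mq latency", "severity": "high", "source": "prometheus"})
assert not validate_alert({"name": "mq latency", "severity": "urgent", "source": "prometheus"})
```

Validating before the POST keeps malformed simulated alerts out of the error metrics, so failures in the counter reflect real transport problems.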
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2256347 and baacc13.

📒 Files selected for processing (1)
  • keep/api/core/demo_mode_runner.py (1 hunks)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between baacc13 and c1dd51a.

📒 Files selected for processing (1)
  • keep/api/core/demo_mode_runner.py (1 hunks)
🔇 Additional comments (1)
keep/api/core/demo_mode_runner.py (1)

348-405: ⚠️ Potential issue

Add graceful shutdown mechanism.

The infinite loop in simulate_alerts lacks a proper shutdown mechanism, which could make it difficult to stop the demo mode gracefully.

Add a shutdown flag:

+    shutdown_flag = threading.Event()
-    while True:
+    while not shutdown_flag.is_set():
         try:
             logger.info("Looping to send alerts...")

Likely invalid or redundant comment.
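The `shutdown_flag` fragment above is not runnable on its own. A self-contained sketch of the same pattern, with the send step reduced to a counter and a `max_iterations` escape hatch added purely so the loop can be exercised in a test:

```python
import threading
from typing import Optional


def simulate_alerts(shutdown_flag: threading.Event,
                    max_iterations: Optional[int] = None) -> int:
    """Run the send loop until shutdown_flag is set; returns rounds completed."""
    iterations = 0
    while not shutdown_flag.is_set():
        iterations += 1  # stand-in for "send one round of demo alerts"
        if max_iterations is not None and iterations >= max_iterations:
            break
        # Event.wait() doubles as an interruptible sleep between rounds.
        shutdown_flag.wait(timeout=0.01)
    return iterations


flag = threading.Event()
worker = threading.Thread(target=simulate_alerts, args=(flag,))
worker.start()
flag.set()  # request shutdown
worker.join(timeout=5.0)
assert not worker.is_alive()
```

Using `shutdown_flag.wait(timeout=...)` instead of `time.sleep(...)` means a stop request interrupts the pause immediately rather than after the full interval.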

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
keep/api/core/demo_mode_runner.py (2)

25-55: Consider adding type hints and documentation for correlation rules.

The correlation rules configuration would benefit from:

  1. Type hints using a Pydantic model or dataclass
  2. Documentation explaining the rule format and parameters
+from typing import List, Dict, Union
+from dataclasses import dataclass
+
+@dataclass
+class CorrelationRule:
+    """Defines a rule for correlating alerts based on SQL and CEL queries.
+    
+    Attributes:
+        sqlQuery: SQL query configuration for matching alerts
+        groupDescription: Description of the alert group
+        ruleName: Unique name of the rule
+        celQuery: CEL query for matching alerts
+        timeframeInSeconds: Time window for grouping alerts
+        timeUnit: Unit of time for the timeframe
+        groupingCriteria: Criteria for grouping alerts
+        requireApprove: Whether approval is required
+        resolveOn: When to resolve the alert group
+    """
+    sqlQuery: Dict[str, Union[str, Dict[str, str]]]
+    groupDescription: str
+    ruleName: str
+    celQuery: str
+    timeframeInSeconds: int
+    timeUnit: str
+    groupingCriteria: List[str]
+    requireApprove: bool
+    resolveOn: str
+
+correlation_rules_to_create: List[CorrelationRule] = [
     {
         "sqlQuery": {"sql": "((name like :name_1))", "params": {"name_1": "%mq%"}},
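Assuming a dataclass along the lines suggested above, the existing dict literals could be converted (and implicitly validated) at startup, since `**`-unpacking fails loudly on missing or unexpected keys. Every field value below other than `sqlQuery` is illustrative, not taken from the file:

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Union


@dataclass
class CorrelationRule:
    sqlQuery: Dict[str, Union[str, Dict[str, str]]]
    groupDescription: str
    ruleName: str
    celQuery: str
    timeframeInSeconds: int
    timeUnit: str
    groupingCriteria: List[str]
    requireApprove: bool
    resolveOn: str


raw_rule = {
    "sqlQuery": {"sql": "((name like :name_1))", "params": {"name_1": "%mq%"}},
    "groupDescription": "MQ-related alerts",   # assumed value
    "ruleName": "mq",                          # assumed value
    "celQuery": 'name.contains("mq")',         # assumed value
    "timeframeInSeconds": 60,
    "timeUnit": "seconds",
    "groupingCriteria": [],
    "requireApprove": False,
    "resolveOn": "never",
}

# Unpacking doubles as a shape check for each configured rule.
rule = CorrelationRule(**raw_rule)
assert {f.name for f in fields(rule)} == raw_rule.keys()
```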

1-450: Consider architectural improvements for better maintainability.

The current implementation could benefit from:

  1. Separation of concerns: Move configuration to separate files
  2. Dependency injection: Extract HTTP client and make it injectable
  3. Better testability: Extract core logic from I/O operations
  4. Monitoring: Add metrics for alert simulation health

Consider:

  1. Creating separate configuration modules for rules, topology, and applications
  2. Implementing an abstract HTTP client interface
  3. Adding health checks and metrics endpoints
  4. Implementing proper logging and monitoring

Would you like me to provide a detailed architectural proposal?
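For the "abstract HTTP client interface" item, one possible shape is a `typing.Protocol` that both `requests` and a test double can satisfy. The endpoint path and function names here are assumptions, not Keep's actual API:

```python
from typing import Any, Optional, Protocol


class HTTPClient(Protocol):
    """Minimal surface the demo runner needs; requests or a fake can satisfy it."""
    def post(self, url: str, json: Optional[dict] = None, **kwargs: Any) -> Any: ...


class FakeClient:
    """Test double that records calls instead of hitting the network."""
    def __init__(self) -> None:
        self.calls: list = []

    def post(self, url: str, json: Optional[dict] = None, **kwargs: Any) -> dict:
        self.calls.append((url, json))
        return {"status": "ok"}


def send_demo_alert(client: HTTPClient, api_url: str, alert: dict) -> Any:
    # The runner depends only on the protocol, so tests never need a live server.
    return client.post(f"{api_url}/alerts/event", json=alert)


fake = FakeClient()
send_demo_alert(fake, "http://localhost:8080", {"name": "demo"})
assert fake.calls == [("http://localhost:8080/alerts/event", {"name": "demo"})]
```

In production code the same `send_demo_alert` would simply receive the real `requests` module (or a `requests.Session`), since both expose a compatible `post`.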

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c1dd51a and 1e83a5c.

📒 Files selected for processing (1)
  • keep/api/core/demo_mode_runner.py (1 hunks)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (1)
keep/api/core/demo_mode_runner.py (1)

1-23: Consider organizing imports and adding type hints.

The imports could be better organized by grouping them into:

  1. Standard library imports
  2. Third-party imports
  3. Local imports

Also, consider adding type hints to improve code maintainability.

 import datetime
 import logging
 import os
 import random
 import threading
 import time
-from datetime import timezone
+from datetime import timezone
+from typing import Dict, List, Optional
 
 import requests
 from dateutil import parser
 from requests.models import PreparedRequest
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1e83a5c and d3ece2a.

📒 Files selected for processing (3)
  • keep/api/config.py (2 hunks)
  • keep/api/core/demo_mode_runner.py (1 hunks)
  • keep/providers/prometheus_provider/alerts_mock.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • keep/providers/prometheus_provider/alerts_mock.py
🔇 Additional comments (5)
keep/api/config.py (3)

16-16: LGTM! Environment variable declaration follows best practices.

The implementation correctly follows the project's naming conventions and safely handles boolean conversion.
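The actual conversion helper in `keep/api/config.py` is not reproduced in this review; a common pattern for the boolean handling being praised here might look like the following (the variable name is the one this PR introduces):

```python
import os


def env_bool(name: str, default: bool = False) -> bool:
    """Unset -> default; otherwise only common truthy spellings count as True."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("true", "1", "yes", "on")


os.environ["KEEP_LIVE_DEMO_MODE"] = "True"
assert env_bool("KEEP_LIVE_DEMO_MODE") is True
os.environ["KEEP_LIVE_DEMO_MODE"] = "false"
assert env_bool("KEEP_LIVE_DEMO_MODE") is False
```

The explicit whitelist avoids the classic pitfall of `bool(os.environ.get(...))`, where the string `"false"` is truthy.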


61-68: Verify the startup sequence order.

The current implementation launches uptime reporting before initializing demo mode. Let's verify whether this order is intentional and whether there are any dependencies between these operations.

✅ Verification successful

Let me gather more information about the implementation details of both functions to better understand their independence and potential interactions.


The startup sequence order is correct and safe.

The code analysis reveals that both functions are independent and can safely run in any order because:

  1. launch_uptime_reporting() runs in its own thread and only handles PostHog telemetry reporting
  2. launch_demo_mode() operates independently in a separate thread, managing demo-specific features like alert simulation and topology setup
  3. Neither function shares critical resources or has dependencies on the other's initialization

The current sequence is logical as it ensures basic telemetry is set up before any demo features are initialized.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any dependencies between uptime reporting and demo mode

# Check if demo mode affects or depends on uptime reporting
rg -A 5 "launch_uptime_reporting|launch_demo_mode" --type py

# Check for any shared resources or configurations
ast-grep --pattern 'def launch_$_() {
  $$$
}'

Length of output: 1668


Script:

#!/bin/bash
# Check the implementation details of both launch functions
cat keep/api/core/report_uptime.py
cat keep/api/core/demo_mode_runner.py

# Check if there are any shared dependencies or configurations
rg "POSTHOG_DISABLED|is_posthog_reachable" --type py

Length of output: 17714


63-66: ⚠️ Potential issue

Add error handling for demo mode initialization.

The demo mode initialization should gracefully handle potential failures to prevent server startup issues.

Apply this diff to add error handling:

     if LIVE_DEMO_MODE:
         logger.info("Launching Keep in demo mode.")
-        from keep.api.core.demo_mode_runner import launch_demo_mode
-        launch_demo_mode()
+        try:
+            from keep.api.core.demo_mode_runner import launch_demo_mode
+            launch_demo_mode()
+        except ImportError as e:
+            logger.error(f"Failed to import demo mode module: {e}")
+            raise
+        except Exception as e:
+            logger.error(f"Failed to launch demo mode: {e}")
+            # Consider whether to continue or exit based on your requirements
+            raise

Likely invalid or redundant comment.

keep/api/core/demo_mode_runner.py (2)

468-469: LGTM!

The main entry point is simple and follows best practices.


25-54: Review SQL query construction for potential injection vulnerabilities.

The correlation rules use parameterized queries, which is good. However, consider:

  1. Adding input validation for the parameters
  2. Using an SQL query builder or ORM to construct queries safely
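To illustrate why the parameterized `:name_1` placeholder already neutralizes injection, here is a stdlib `sqlite3` sketch of named-parameter binding (the table and rows are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (name TEXT)")
conn.executemany("INSERT INTO alerts VALUES (?)", [("mq-lag",), ("db-down",)])

# Named-parameter binding, mirroring the ((name like :name_1)) rule above.
rows = conn.execute(
    "SELECT name FROM alerts WHERE name LIKE :name_1", {"name_1": "%mq%"}
).fetchall()
assert rows == [("mq-lag",)]

# A hostile "pattern" stays data, never SQL: it simply matches nothing.
hostile = "%' OR '1'='1"
rows = conn.execute(
    "SELECT name FROM alerts WHERE name LIKE :name_1", {"name_1": hostile}
).fetchall()
assert rows == []
```

Input validation on the parameter values is still worthwhile (e.g. bounding pattern length), but it guards against abuse of LIKE wildcards, not injection.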

@Matvey-Kuk Matvey-Kuk marked this pull request as draft November 20, 2024 08:34
@Matvey-Kuk
Contributor Author

@coderabbitai resolve

coderabbitai bot commented Nov 20, 2024

✅ Actions performed

Comments resolved.

Labels
Documentation Improvements or additions to documentation Feature A new feature size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[👨🏻‍💻 Internal]: Demo mode
2 participants